Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
Computers can understand and then engage with people in an emotionally
intelligent way thanks to speech-emotion recognition (SER). However, the
performance of SER in cross-corpus and real-world live data feed scenarios can
be significantly improved. The inability to adapt an existing model to a new
domain is one of the shortcomings of SER methods. To address this challenge,
researchers have developed domain adaptation techniques that transfer knowledge
learnt by a model across domains. Although existing domain adaptation
techniques have improved performance across domains, they can be improved to
adapt to a real-world live data feed situation where a model can self-tune
while deployed. In this paper, we present a deep reinforcement learning-based
strategy (RL-DA) for adapting a pre-trained model to a real-world live data
feed setting while interacting with the environment and collecting continual
feedback. RL-DA is evaluated on SER tasks, including cross-corpus and
cross-language domain adaptation schemas. Evaluation results show that in a live
data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in
cross-corpus and cross-language scenarios, respectively.
Enhancing Speech Emotion Recognition Through Differentiable Architecture Search
Speech Emotion Recognition (SER) is a critical enabler of emotion-aware
communication in human-computer interactions. Recent advancements in Deep
Learning (DL) have substantially enhanced the performance of SER models through
increased model complexity. However, designing optimal DL architectures
requires prior experience and experimental evaluations. Encouragingly, Neural
Architecture Search (NAS) offers a promising avenue to determine an optimal DL
model automatically. In particular, Differentiable Architecture Search (DARTS)
is an efficient method of using NAS to search for optimised models. This paper
proposes a DARTS-optimised joint CNN and LSTM architecture to improve SER
performance; the literature informs the choice to couple CNN and LSTM, a
combination reported to offer improved performance. While DARTS has previously been applied
to CNN and LSTM combinations, our approach introduces a novel mechanism,
particularly in selecting CNN operations using DARTS. In contrast to previous
studies, we refrain from imposing constraints on the order of the layers for
the CNN within the DARTS cell; instead, we allow DARTS to determine the optimal
layer order autonomously. Experimenting with the IEMOCAP and MSP-IMPROV
datasets, we demonstrate that our proposed methodology achieves significantly
higher SER accuracy than hand-engineering the CNN-LSTM configuration. It also
outperforms the best-reported SER results achieved using DARTS on CNN-LSTM.
Comment: 5 pages, 4 figures
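The abstract above says that DARTS is left free to choose the CNN operations and their order. The sketch below illustrates only the general DARTS idea of a mixed operation, where a softmax over architecture weights blends candidate operations and the strongest one is kept after the search. It is not the authors' implementation: the candidate operations, the `alphas` values and all function names are hypothetical, and in a real search the architecture weights would be learned by gradient descent rather than set by hand.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations on a 1-D feature vector
# (simple stand-ins for the convolution/pooling choices in a DARTS cell).
def identity(x):
    return list(x)

def avg_pool3(x):
    """Average over a window of 3, clamping indices at the edges."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]

def zero(x):
    return [0.0] * len(x)

CANDIDATES = {"identity": identity, "avg_pool3": avg_pool3, "zero": zero}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate outputs."""
    weights = softmax([alphas[name] for name in CANDIDATES])
    outputs = [op(x) for op in CANDIDATES.values()]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]

def discretise(alphas):
    """After the search, keep the candidate with the largest architecture weight."""
    return max(CANDIDATES, key=lambda name: alphas[name])

# Hand-picked architecture weights, for illustration only (assumption).
alphas = {"identity": 0.2, "avg_pool3": 1.5, "zero": -1.0}
features = [1.0, 2.0, 4.0, 8.0]
mixed = mixed_op(features, alphas)
chosen = discretise(alphas)
print(chosen, mixed)
```

Because every candidate contributes to the output during the search, the choice of operation becomes differentiable with respect to `alphas`, which is what allows a gradient-based search over operations instead of a hand-designed layer order.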
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
Despite the recent advancement in speech emotion recognition (SER) within a
single corpus setting, the performance of these SER systems degrades
significantly for cross-corpus and cross-language scenarios. The key reason is
the lack of generalisation in SER systems towards unseen conditions, which
causes them to perform poorly in cross-corpus and cross-language settings.
To address this issue, recent studies use adversarial methods to learn
domain-generalised representations that improve cross-corpus and cross-language
SER. However, many of these methods only focus on cross-corpus
SER without addressing the cross-language SER performance degradation due to a
larger domain gap between source and target language data. This contribution
proposes an adversarial dual discriminator (ADDi) network that uses a
three-player adversarial game to learn generalised representations without
requiring any target data labels. We also introduce a self-supervised ADDi
(sADDi) network that utilises self-supervised pre-training with unlabelled
data. We propose synthetic data generation as a pretext task in sADDi, enabling
the network to produce emotionally discriminative and domain invariant
representations and providing complementary synthetic data to augment the
system. The proposed model is rigorously evaluated using five publicly
available datasets in three languages and compared with multiple studies on
cross-corpus and cross-language SER. Experimental results demonstrate that the
proposed model achieves improved performance compared to the state-of-the-art
methods.
Comment: Accepted in IEEE Transactions on Affective Computing
Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition
Despite the recent progress in speech emotion recognition (SER),
state-of-the-art systems lack generalisation across different conditions. A key
underlying reason for poor generalisation is the scarcity of emotion datasets,
which is a significant roadblock to designing robust machine learning (ML)
models. Recent works in SER focus on utilising multitask learning (MTL) methods
to improve generalisation by learning shared representations. However, most of
these studies propose MTL solutions with the requirement of meta labels for
auxiliary tasks, which limits the training of SER systems. This paper proposes
an MTL framework (MTL-AUG) that learns generalised representations from
augmented data. We utilise augmentation-type classification and unsupervised
reconstruction as auxiliary tasks, which allow training SER systems on
augmented data without requiring any meta labels for auxiliary tasks. The
semi-supervised nature of MTL-AUG allows for the exploitation of the abundant
unlabelled data to further boost the performance of SER. We comprehensively
evaluate the proposed framework in the following settings: (1) within corpus,
(2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial
attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB
datasets show improved results compared to existing state-of-the-art methods.
Comment: Under review IEEE Transactions on Affective Computing
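The augmentation-type classification task described in the abstract above needs no meta labels, because the label is simply which augmentation was applied. The sketch below shows one way such training pairs and a combined multitask loss could be built. It is an illustration under assumptions, not the MTL-AUG implementation: the three augmentations, the loss weights `w_aug` and `w_rec`, and all function names are hypothetical.

```python
import random

# Hypothetical waveform augmentations; the real framework may use different ones.
def add_noise(signal, rng, scale=0.05):
    return [s + rng.gauss(0.0, scale) for s in signal]

def scale_gain(signal, rng, low=0.5, high=1.5):
    gain = rng.uniform(low, high)
    return [s * gain for s in signal]

def time_reverse(signal, rng):
    return signal[::-1]

AUGMENTATIONS = [add_noise, scale_gain, time_reverse]

def make_auxiliary_batch(signals, seed=0):
    """Pair each augmented signal with the index of the augmentation used.

    The index is the auxiliary-task label, so no human annotation is needed.
    """
    rng = random.Random(seed)
    batch = []
    for signal in signals:
        label = rng.randrange(len(AUGMENTATIONS))
        augmented = AUGMENTATIONS[label](signal, rng)
        batch.append((augmented, label))
    return batch

def multitask_loss(emotion_loss, aug_type_loss, reconstruction_loss,
                   w_aug=0.5, w_rec=0.5):
    """Weighted sum of the primary loss and the two auxiliary losses (weights assumed)."""
    return emotion_loss + w_aug * aug_type_loss + w_rec * reconstruction_loss

unlabelled = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.4, -0.1, -0.3]]
batch = make_auxiliary_batch(unlabelled)
for augmented, label in batch:
    print(AUGMENTATIONS[label].__name__, augmented)
```

Only the emotion loss needs emotion labels. The two auxiliary terms can be computed on unlabelled speech, which is the semi-supervised property the abstract refers to.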
HER-2 Immunohistochemical Expression in Bone Sarcomas: A New Hope for Osteosarcoma Patients
BACKGROUND: Osteosarcoma and chondrosarcoma remain the most common primary bone tumours. Questions have been raised about the prognostic influence of HER-2 in bone sarcomas, but so far the results have been debatable. HER-2 expression is possibly a predictor of chemotherapy response.
AIM: In this study, we investigated the extent of HER-2 expression in bone sarcomas and attempted to correlate it with pertinent variables that will help to provide better treatment options, especially for metastatic cases.
MATERIAL AND METHODS: Fifty-two cases of bone sarcoma (32 osteosarcomas and 20 chondrosarcomas) were studied for HER-2 immunohistochemical expression, which was then correlated with all available clinicopathologic features.
RESULTS: Most of the osteosarcoma cases exhibited membranous staining (78.1%). Strong staining (score 3+) was observed in 34.4%, moderate staining (score 2+) in 21.9%, and weak staining (score 1+) in 21.9%. No staining (score 0) was detected in 7 out of 32 cases (21.9%). In chondrosarcoma, staining was absent in all examined cases. Immunohistochemical HER-2 overexpression correlated significantly with osteosarcoma site (P = 0.004), with variation relating HER-2 intensity score to the site of osteosarcoma (P = 0.051). A statistically significant negative correlation was detected between HER-2 expression and the presence of metastasis at the time of diagnosis (P = 0.006). A significant correlation was also found between HER-2 score and the presence of metastasis (P = 0.046), as more than half of the cases with no metastasis at diagnosis (17/28 cases, 60.7%) showed a positive intensity score. A statistically significant correlation was detected between HER-2 expression and patients’ age (P = 0.044). HER-2 expression also correlated significantly with histopathological detection of fibrous tissue (P = 0.033). Higher scores of HER-2 expression were associated with significantly better differentiation (P = 0.038), since detection of wide areas of osteoid was associated with higher HER-2 scores.
CONCLUSION: Further research is still needed to delineate the role of HER-2 as a new hope for therapeutic targeting in bone sarcoma patients, mainly in osteosarcoma, in contrast to chondrosarcoma, which did not express HER-2 at all.
Towards Optimal Kinetic Energy Harvesting for the Batteryless IoT
Traditional Internet of Things (IoT) sensors rely on batteries that need to
be replaced or recharged frequently, which impedes their pervasive deployment. A
promising alternative is to employ energy harvesters that convert the
environmental energy into electrical energy. Kinetic Energy Harvesting (KEH)
converts the ambient motion/vibration energy into electrical energy to power
the IoT sensor nodes. However, most previous works employ KEH without
dynamically tracking the optimal operating point of the transducer for maximum
power output. In this paper, we systematically analyse the relation between the
operating point of the transducer and the corresponding energy yield. To this
end, we explore the voltage-current characteristics of the KEH transducer to
find its Maximum Power Point (MPP). We show how this operating point can be
approximated in a practical energy harvesting circuit. We design two hardware
circuit prototypes to evaluate the performance of the proposed mechanism and
analyse the harvested energy using a precise load shaker under a wide set of
controlled conditions typically found in human-centric applications. We analyse
the dynamic current-voltage characteristics and specify the relation between
the MPP sampling rate and harvesting efficiency, which underlines the need for
dynamic MPP tracking. The results show that the proposed energy harvesting
mechanism outperforms the conventional method in terms of generated power,
offering at least one order of magnitude higher power.
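To make the Maximum Power Point idea in the abstract above concrete, the sketch below sweeps the operating voltage of a transducer and picks the point of highest power. It assumes a deliberately simple linear source model (an open-circuit voltage `v_oc` behind an internal resistance `r_int`), which is not taken from the paper; the component values and function names are hypothetical. Under this model the MPP sits at half the open-circuit voltage.

```python
def harvested_power(v_op, v_oc, r_int):
    """Power (W) delivered at operating voltage v_op by a linear source.

    Assumed model: current = (v_oc - v_op) / r_int, so power = v_op * current.
    Outside the range 0..v_oc the source delivers no power in this sketch.
    """
    if not 0.0 <= v_op <= v_oc:
        return 0.0
    current = (v_oc - v_op) / r_int
    return v_op * current

def find_mpp(v_oc, r_int, steps=1000):
    """Sweep the operating voltage and return (voltage, power) at the maximum."""
    best_v, best_p = 0.0, 0.0
    for i in range(steps + 1):
        v = v_oc * i / steps
        p = harvested_power(v, v_oc, r_int)
        if p > best_p:
            best_v, best_p = v, p
    return best_v, best_p

R_INT = 1000.0  # ohms, hypothetical internal resistance

# Operating point chosen once, for a 3.0 V open-circuit voltage.
v_mpp, p_mpp = find_mpp(v_oc=3.0, r_int=R_INT)

# When the excitation changes, a fixed operating point drifts away from the MPP.
for v_oc in (3.0, 2.0, 1.6):
    tracked = find_mpp(v_oc, R_INT)[1]
    fixed = harvested_power(v_mpp, v_oc, R_INT)
    print(f"v_oc={v_oc:.1f} V  tracked={tracked * 1e3:.3f} mW  fixed={fixed * 1e3:.3f} mW")
```

In this toy model, holding the operating point fixed while the open-circuit voltage drops gives up a large share of the available power, which is the intuition behind re-sampling the MPP dynamically.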
Survey of deep representation learning for speech emotion recognition
Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges the gap in the literature, since existing surveys either focus on SER with hand-engineered features or on representation learning in the general setting without focusing on SER.